Extracting data

Before you start

Data source settings

Configure the data source settings beforehand, both to make sure that the data is read properly and to organize it in a record structure that suits the purpose of the data mapping configuration (see Data source settings). It is important to set the boundaries before you start extracting data, especially transactional data (see Extracting transactional data). Boundaries determine which data blocks (lines, pages, or nodes) form a record in the source data. Data located in different source records cannot be combined into a single record in the record set that the extraction workflow produces.
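
As a rough illustration of how boundaries shape records, the Python sketch below groups the lines of a text source into records, starting a new record at each line that matches a boundary test. The function name, the "INVOICE" marker and the sample data are hypothetical. This is a conceptual model only, not the way the application implements boundaries.

```python
def split_into_records(lines, is_boundary):
    """Group lines into records; a new record starts at each boundary line."""
    records = []
    current = []
    for line in lines:
        # Close the running record when the next boundary line is reached.
        if is_boundary(line) and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


# Hypothetical source data: every invoice starts with an "INVOICE" line.
source_lines = [
    "INVOICE 1001",
    "Item A  10.00",
    "Item B   5.00",
    "INVOICE 1002",
    "Item C   7.50",
]

records = split_into_records(source_lines, lambda line: line.startswith("INVOICE"))
```

In this model, "Item C" sits past the second boundary, so it can never end up in the record for invoice 1001. That is why the boundaries have to be right before any extraction is added.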

Preprocessor step

The Preprocessor step allows the application to perform actions on the data file itself before it is handed over to the Data Mapping workflow. In addition, properties can be defined in this step. These properties may be used throughout the extraction workflow. For more information, see Preprocessor step.

Adding an extraction

In an extraction workflow, Extract steps are the pieces that take care of the actual data extractions.
To add an Extract step:

  1. In the Data Viewer pane, select the data that needs to be extracted. (See Selecting data.)
  2. Choose one of two ways to extract the selected data.
    • Right-click on the selected data and select Add Extraction from the contextual menu.
      For optimization purposes, it is better to add data to an existing Extract step than to have a succession of Extract steps. To do that, select the existing step on the Steps pane first; then right-click on the selected data and choose Add Extract Field.
    • Alternatively, drag & drop the selected fields into the Data Model pane.
      In a PDF or Text file, use the Drag icon to drag selected data into the Data Model.
      With this method, a new Extract step is only added to the extraction workflow when no Extract step is present on the Steps pane yet. Otherwise the fields are added to the selected Extract step or to the one that was last added.
      Dragging data into an existing field in the Data Model will replace the data. The field name stays the same.
      Drop data on empty fields or on the record itself to add new fields.
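
The drag & drop rule above can be pictured with a small Python sketch. It is a simplified model, not the application's code: the dictionary layout and the step name are hypothetical, and it assumes that a selected Extract step takes precedence over the one that was last added.

```python
def target_step_for_drop(extract_steps, selected=None):
    """Return the Extract step that receives dropped fields.

    A new step is created only when the workflow has no Extract step yet.
    """
    if not extract_steps:
        new_step = {"name": "Extract 1", "fields": []}  # hypothetical name
        extract_steps.append(new_step)
        return new_step
    if selected is not None:
        return selected  # assumption: the selected step wins
    return extract_steps[-1]  # otherwise the step that was added last


steps = []
first = target_step_for_drop(steps)    # no step present: one is created
first["fields"].append("CustomerName")
again = target_step_for_drop(steps)    # a step exists: it is reused
again["fields"].append("InvoiceDate")
```

After both drops the workflow still contains a single Extract step holding two fields, which matches the advice to avoid a succession of Extract steps.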

Special conditions

The Extract step may need to be combined with another type of step to get the desired result.

Fields cannot be used twice in one extraction workflow.
Different Extract steps can only write extracted data to the same field in the Data Model if:

  • The field name is the same. (See: Renaming and ordering fields.)
  • The Extract steps are mutually exclusive. This is the case when they are located in different branches of a Condition step or Multiple Conditions step.
  • The option Append values to current record is checked in the Step properties pane under Extraction Definition.

Extracting data into multiple fields

When you select multiple fields in a CSV or tabular data file and extract them simultaneously, they are put into different fields in the Data Model automatically.
In a PDF or Text file, when multiple lines are extracted at the same time, they are by default joined and put into one field in the Data Model. To split them and put the data into different fields:

  1. Select the field in the Data Model that contains the extracted lines.
  2. On the Step properties pane, under Field Definition, click the drop-down next to Split and select Split lines.
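
The difference between the default (joined) result and the Split lines result can be sketched as follows. The function, the field names and the separator used for joining are hypothetical; the sketch only illustrates the shape of the outcome, not how the application names fields or joins lines.

```python
def extract_lines(lines, split=False, separator=" ", base_name="Field"):
    """Model a multi-line extraction from a PDF or Text file.

    split=False: all lines are joined and stored in one field (the default).
    split=True:  every line is stored in a field of its own.
    """
    if split:
        return {f"{base_name}{i}": line for i, line in enumerate(lines, start=1)}
    return {base_name: separator.join(lines)}


address_lines = ["John Smith", "12 Main Street", "Springfield"]

joined = extract_lines(address_lines)
separate = extract_lines(address_lines, split=True)
```

Here joined holds one field with all three lines, while separate holds three fields with one line each.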

Adding fields to an existing Extract step

For optimization purposes, it is better to add fields to an existing Extract step than to have a succession of Extract steps.

To add fields to an existing Extract step:

  1. In the Data Viewer pane, select the data that needs to be extracted. (See Selecting data.)
  2. Select an Extract step on the Steps pane.
  3. Right-click on the data and select Add Extract Field, or drag & drop the data on the Data Model.

When data are dropped on the Data Model, they are by default added to the Extract step that was added last.

Editing fields

After extracting some data, you may want to:

  • Change the names of fields that are included in the extraction.
  • Change the order in which fields are extracted.
  • Set the data type, data format and default value of each field.
  • Modify the extracted data through a script.
  • Delete a field.

All this can be done via the Step properties pane (see Extract step properties), because the fields in the Data Model are seen as properties of an Extract step. See also: Fields.

Testing the extraction workflow

The extraction workflow is always performed on the current record in the data source. When an error is encountered, the extraction workflow stops, and the field on which the error occurred and all subsequent fields will be greyed out. Click the Messages tab (next to the Step properties pane) to see any error messages.
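
The stop-on-error behavior described above can be modelled with a short Python sketch. All names and the sample record are hypothetical, and the extractors are plain functions standing in for Extract step fields. It is meant as an aid to reading the greyed-out fields, not as a description of the application's internals.

```python
def run_extraction(fields, record):
    """Run the extractors in order and stop at the first error.

    fields is a list of (field_name, extractor) pairs. Returns the values
    that were extracted, the names of the fields that would appear greyed
    out, and the error messages.
    """
    values = {}
    greyed_out = []
    messages = []
    for index, (name, extractor) in enumerate(fields):
        try:
            values[name] = extractor(record)
        except Exception as exc:
            messages.append(f"{name}: {exc}")
            # The failing field and every field after it are not filled in.
            greyed_out = [field_name for field_name, _ in fields[index:]]
            break
    return values, greyed_out, messages


record = {"id": "1001", "total": "abc", "date": "2017-06-15"}
fields = [
    ("ID", lambda r: r["id"]),
    ("Total", lambda r: float(r["total"])),  # fails: "abc" is not a number
    ("Date", lambda r: r["date"]),
]

values, greyed_out, messages = run_extraction(fields, record)
```

In this run only ID is extracted. Total fails, so Total and Date are both reported as greyed out even though Date would have extracted without a problem.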

To test the extraction workflow on all records, you can:

  • Click the Validate All Records toolbar button.
  • Select Data > Validate Records in the menu.

If any errors are encountered in one or more records, an error message will be displayed. Errors encountered while performing the extraction workflow on the current record will also be visible on the Messages tab.

 
  • Last Topic Update: 09:07 AM Jun-15-2017
  • Last Published: 2019-05-22 : 2:51 PM